Multilingual Subjectivity and Sentiment Analysis
Abstract
Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations, in natural language. Subjectivity classification labels text as either subjective or objective; sentiment classification adds a further level of granularity by classifying subjective text as positive, negative, or neutral. Although much of the research in this area has addressed English, work on other languages is growing, including Japanese, Chinese, German, Spanish, and Romanian. Most researchers in the field are familiar with the methods applied to English, but few have looked closely at the original research carried out in other languages. For example, for Chinese, researchers have examined the ability of individual characters to carry sentiment information (Ku et al., 2005; Xiang, 2011). In Romanian, experiments have hinted that subjectivity detection may be easier to achieve because of the markers of politeness and additional verbal modes embedded in the language (Banea et al., 2008). These additional sources of information may not be available in every language, yet various articles have pointed out that a synergistic approach that detects subjectivity and sentiment in multiple languages at the same time can yield improvements not only in other languages but in English as well. The development of, and interest in, these methods is also strongly motivated by the fact that only 27% of Internet users speak English (www.internetworldstats.com/stats.htm, Oct 11, 2011), a share that shrinks every year as more people across the globe gain Internet access. The aim of this tutorial is to familiarize attendees with the subjectivity and sentiment research carried out on languages other than English, in order to enable and promote cross-fertilization.
Specifically, we will review work along three main directions. First, we will present methods where the resources and tools have been developed specifically for a given target language. In this category, we will also briefly review the main methods that were proposed for English but can be easily ported to other languages. Second, we will describe cross-lingual approaches, including several methods that leverage the resources and tools available in English through cross-lingual projections. Third, we will show how the expression of opinions and polarity crosses language boundaries, so that methods that holistically explore multiple languages at the same time can be used effectively.
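To make the idea of cross-lingual projection concrete, the following is a minimal sketch, not a method taken from the tutorial. It assumes a tiny English polarity lexicon and a tiny English-to-Spanish bilingual dictionary, both invented for illustration, and projects polarity labels onto the target-language words. The helper names project_lexicon and classify are hypothetical, and the counting rule is a deliberately naive stand-in for a real classifier.

```python
# Toy illustration only: the lexicon, the bilingual dictionary, and the
# scoring rule are assumptions made for this sketch.

# Hypothetical English polarity lexicon: word -> +1 (positive) or -1 (negative).
ENGLISH_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}

# Hypothetical English -> Spanish bilingual dictionary.
EN_TO_ES = {
    "good": ["bueno", "buena"],
    "excellent": ["excelente"],
    "bad": ["malo", "mala"],
    "terrible": ["terrible"],
}


def project_lexicon(source_lexicon, bilingual_dict):
    """Copy each source word's polarity onto its target-language translations."""
    target = {}
    for word, polarity in source_lexicon.items():
        for translation in bilingual_dict.get(word, []):
            # If two source words share a translation, keep the first polarity seen.
            target.setdefault(translation, polarity)
    return target


def classify(text, lexicon):
    """Label text 'objective', or 'positive' / 'negative' / 'neutral' if subjective."""
    tokens = text.lower().split()
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return "objective"  # no subjectivity clue from the lexicon was found
    score = sum(hits)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


SPANISH_LEXICON = project_lexicon(ENGLISH_LEXICON, EN_TO_ES)
print(classify("la comida es excelente", SPANISH_LEXICON))
print(classify("la mesa es de madera", SPANISH_LEXICON))
```

The two-step labeling mirrors the distinction drawn in the abstract: text with no lexicon hit is treated as objective, and only subjective text receives a polarity. Real projection methods must also handle word sense ambiguity and morphology, which this sketch ignores.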
Similar resources
Multilingual Sentiment and Subjectivity Analysis
Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positi...
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...
Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on Englis...
Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams and Exploiting Gender Language Differences on Twitter
We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on Englis...
Sentiment analysis system adaptation for multilingual processing: The case of tweets
Nowadays, opinion mining systems play a strategic role in areas such as Marketing, Decision Support Systems, and Policy Support. Since the arrival of Web 2.0, more and more textual documents containing information that expresses opinions or comments in different languages have become available. Given the proven importance of such documents, the use of effective multilingual opinion mining syst...